July 
2013 




Volume 3, Issue 7 



ISSN: 2249-0558 



PRESENTING A NEW METHOD FOR RECOGNITION OF 
FARSI HANDWRITTEN DIGITS 



Navid Samimi Behbahan* 
Amin Samimi Behbahab* 
Milad Samimi Behbahan* 



Abstract 

In this article, changes in the slope of Persian handwritten digits are used to describe each 
digit. The main purpose is to present a method for online rather than offline recognition of 
handwritten digits. In this method, after several pre-processing steps, the slope changes of each 
digit are calculated at equally spaced points, starting from the bottom-left corner and continuing 
along the curve of the digit. These features are used as the input to a Hidden Markov Model (HMM), 
which compares the similarity of each sample with the related references. Finally, the system has 
been evaluated on the HODA database, which is cited by other researchers. The data set used 
includes 1300 samples, and a correct recognition rate of 91.8% has been obtained. 
1. Introduction 



Research on the recognition of English handwritten digits and letters began about 
50 years ago, while research on Persian and Arabic characters appears to date back 
about 29 years. Today, the recognition of handwritten digits has many applications, such as reading 
the amounts on bank checks, license plates, zip codes, and other information on forms. Much 
research has been done on the recognition of Persian handwritten digits, most of it based on 
feature extraction methods combined with trainable classifiers. Feature extraction is used to 
reduce the dimension of the input data. To recognize letters and digits, zonal features [1], 
geometric moments [2], Zernike moments [3], Fourier descriptors [4], static moments [5], and 
histogram display and place features [6] have been proposed. The choice of feature 
depends on the application. Usually, the most appropriate feature is identified after an 
experimental evaluation of the data concerned. 

In this paper, the feature extracted for the recognition of digits is the slope change. First, a 
number of digits from each class are selected as references to build the related HMM, so 
that the various common handwriting styles of Persian digits are preserved. Then, each digit is 
thinned, and 32 "on" pixels, separated by runs of "off" pixels, are selected to represent the 
digit. Next, the slope of the line through each two successive "on" pixels is calculated, and in 
this way we find the slope change between each two consecutive lines. This applies to all test and 
training samples. 
The second section discusses how the references have been selected. The 
third section describes three pre-processing steps applied to the samples and references, which are 
very important. In the fourth section, the Hidden Markov Model is presented and the parameter 
initialization method is reviewed. In the fifth section, the results of the implementation are 
described. The final section is devoted to the conclusion. 



2. Selecting Patterns for Making Hidden Markov Models 

References for Persian handwritten digits should be selected based on the diversity of each digit. 
For example, the digit four is written in more varied ways than the digit one, so it requires more 
references. Selecting appropriate references for each digit therefore leads to a higher 
recognition rate for that digit. There is a trade-off, however: the more references a digit has, 
the higher the probability that most input forms belonging to that class are identified correctly, 
but also the higher the probability that these references attract input forms belonging to other 
digits, which decreases the recognition rate. The number of references should therefore be chosen 
optimally. Samples of these references are shown in Figure 1. 

A Monthly Double-Blind Peer Reviewed Refereed Open Access International e-Journal - Included in the International Serial Directories 
Indexed & Listed at: Ulrich's Periodicals Directory ©, U.S.A., as well as in Cabell's Directories of Publishing Opportunities, U.S.A. 

International Journal of Management, IT and Engineering 
http://www.ijmra.us 



Figure 1: Samples of references of handwritten digits 




3. Pre-processing 

The coordinates of the sampling points, their number and spacing, and the 
dimensions of the resulting curves can change drastically from one digit to another and from 
one writer to another. To minimize the effect of these variations on the recognition process, the 
set of points sampled from each digit should be normalized. In this 
stage, the sampled points and the digit dimensions are made uniform. 

The pre-processing stage includes four steps: digit skeleton extraction (thinning), converting the 
thinned image to a dotted line, making the sampled points uniform, and making the dimensions uniform. 
3.1. Digits Thinning 

In this phase, the Canny algorithm has been used for thinning and obtaining the skeleton of each 
digit without causing erosion or disconnection. This algorithm produces a proper skeleton which 
maintains the original shape of the digit and does not introduce any spurious data. Figure 2 shows 
an example of the output of this phase, which is a single continuous skeleton. The purpose of 
this phase is to remove the effect of differences in stroke thickness between handwritten digits. 

Figure 2: A sample that has been thinned 





3.2. Making Sampled Points Uniform 

The major advantages of this method are its robustness to small variations, its simplicity of 
implementation, and its high recognition accuracy. It also allows a more accurate description of 
the thinned image. To make the Hidden Markov Model (HMM) easier to apply 
and more efficient, the sampled points must be uniform. In this way, 
all digits are resampled to a fixed number of time steps and converted into 
sequence data suitable for the HMM. 

For this purpose, a third-order spline is first used to interpolate between every four 
consecutive points of the digit curve, which gives an estimate of the overall shape of the digit. 
Then 32 points are sampled from the resulting curve so that the curve length between every two 
consecutive points is equal. Figure 3 shows an example of making the sampled 
points uniform. 

Figure 3: An instance of making sampled points uniform 
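
The equal-distance sampling described above can be illustrated with a minimal sketch. It is not the implementation used in this work: it assumes the thinned digit is already available as an ordered list of (x, y) points, and it replaces the third-order spline with straight-line interpolation between neighbouring points to keep the example short. The function name resample_equal_arc_length is hypothetical.

```python
import math

def resample_equal_arc_length(points, n_samples=32):
    """Return n_samples points spaced evenly along the polyline `points`.

    Assumption: linear interpolation stands in for the cubic spline.
    """
    if len(points) < 2 or n_samples < 2:
        raise ValueError("need at least two input points and two samples")
    # Cumulative curve length at each input point.
    cumulative = [0.0]
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        cumulative.append(cumulative[-1] + math.hypot(x1 - x0, y1 - y0))
    total = cumulative[-1]
    if total == 0.0:
        raise ValueError("curve has zero length")
    step = total / (n_samples - 1)
    resampled = []
    seg = 0  # index of the segment containing the current target length
    for k in range(n_samples):
        target = min(k * step, total)
        # Advance to the segment whose end lies at or beyond the target.
        while seg < len(points) - 2 and cumulative[seg + 1] < target:
            seg += 1
        seg_len = cumulative[seg + 1] - cumulative[seg]
        t = 0.0 if seg_len == 0.0 else (target - cumulative[seg]) / seg_len
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return resampled
```

With the default n_samples=32, every digit is reduced to the same number of points regardless of its size or of how many skeleton pixels it contains, which is what gives the HMM sequences of equal length.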




3.3. Making Dataset 

Connecting each two consecutive points extracted in the previous phase with a line segment gives 
31 line segments. There are 30 angles between these consecutive line 
segments, which are called slope changes; these 30 slope changes are the features that 
represent each digit. 
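
As an illustration of this step, the following sketch computes the angle between each pair of consecutive line segments. It assumes the input is the ordered list of 32 sampled points and expresses each slope change as a signed angle in radians; the source does not state the unit or sign convention, so both are assumptions, and the function name slope_changes is hypothetical.

```python
import math

def slope_changes(points):
    """Return the signed turning angle (radians) at each interior point."""
    # Direction of each line segment between consecutive points.
    directions = [
        math.atan2(y1 - y0, x1 - x0)
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    ]
    changes = []
    for a, b in zip(directions, directions[1:]):
        delta = b - a
        # Wrap into [-pi, pi] so a small turn never shows up as a jump near 2*pi.
        changes.append(math.atan2(math.sin(delta), math.cos(delta)))
    return changes
```

For 32 input points this returns 30 values, one per pair of adjacent segments, matching the feature count given above.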



4. Classification by Hidden Markov Model 

To determine the class of each digit, a Hidden Markov Model (HMM) with the 
following parameters has been used: 

Initial Probability: The initial probability of each phase is first set to a random number and 
then receives a new value from the K-means algorithm during training. 

Transfer Distribution (A) & Observations Probability Distribution (B): The parameters 
of these two distributions are estimated from a training set during the training process and then 
stored for use in the model. The initial values of these two distributions are 
set using the K-means algorithm. The observation probability distribution is a mixture of two 
normal distributions. 

Observation Length (T): This parameter is the number of slope changes in a sample. As 
explained in the pre-processing section, 30 slope changes are computed 
for each sample. 

Number of Phases of the Model (N): This parameter is the number of phases (states) in the Markov 
model, and it has been determined for each digit separately, based on the geometric shape of the 
digit. No exact criterion has been used to choose it. For 
example, 5 phases are used for digit "1" because of its simple geometric shape, and 9 phases are 
used for digit "3" because of its greater complexity. 

Number of Samples: The number of samples used to train the model differs 
from class to class and depends on the diversity in the way each digit is written. For 
example, 15 samples have been used to train digit "1", which shows little diversity in its 
writing, while 50 samples have been used for digit "5". 

Hidden Variables (Qi): The hidden variables of the HMM are the slope changes described in the 
previous section. 

Observations (V): The observations in this model are the digit categories, which are 
determined from the hidden variables and a probability distribution. To link the observations to 
the hidden variables, a mixture of two normal distributions has been applied; the number of 
components was selected purely experimentally, by trial and error, until 
a desirable result was achieved. 

Type of Relation between Phases: The phases of the model can be connected in several ways, the 
simplest of which is the fully connected form, where every phase is linked to every 
other. This pattern has been used in the present study. It was also chosen 
by trial and error while searching for an appropriate pattern. 



Figure 4: Hidden Markov Model and Parameters Used in its Definition 




One HMM with its own parameters is created for each digit class; these models are then used to 
determine the category of each input digit. 
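
The decision rule of "one model per digit, pick the best match" can be sketched as follows. This is an illustrative sketch only, not the HTK-based system used in this work: it assumes each model is a plain dictionary with keys "pi" (initial probabilities), "A" (transition matrix) and "emissions" (one list of (weight, mean, std) mixture components per phase), and it assumes the model parameters have already been trained. All function names and the dictionary layout are hypothetical.

```python
import math

def safe_log(p):
    """Logarithm that maps a zero probability to minus infinity."""
    return math.log(p) if p > 0 else -math.inf

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))

def log_gaussian(x, mean, std):
    """Log density of a one-dimensional normal distribution."""
    return -0.5 * math.log(2 * math.pi * std * std) - (x - mean) ** 2 / (2 * std * std)

def log_emission(x, components):
    """Log density of x under a mixture of (weight, mean, std) components."""
    return log_sum_exp([safe_log(w) + log_gaussian(x, mu, sd) for w, mu, sd in components])

def log_likelihood(observations, model):
    """Forward algorithm in log space: log P(observations | model)."""
    pi, A, emissions = model["pi"], model["A"], model["emissions"]
    n = len(pi)
    alpha = [safe_log(pi[i]) + log_emission(observations[0], emissions[i]) for i in range(n)]
    for x in observations[1:]:
        alpha = [
            log_sum_exp([alpha[i] + safe_log(A[i][j]) for i in range(n)])
            + log_emission(x, emissions[j])
            for j in range(n)
        ]
    return log_sum_exp(alpha)

def classify(observations, models):
    """Return the label whose HMM gives the observations the highest likelihood."""
    return max(models, key=lambda label: log_likelihood(observations, models[label]))
```

Here `observations` would be the 30 slope changes of one input digit and `models` a dictionary mapping each digit label to its trained model. Working in log space avoids numerical underflow when the per-step probabilities are multiplied over the whole sequence.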

5. Implementation & Experimental Results 

MATLAB has been used to implement the recognition program, and the HTK software has been 
used for the HMM. The HODA data collection provided by Hossein 
Khosravi (a student of Tarbiyat Modarres University) [7] has been used as the training and 
test dataset. In this dataset, about 10000 samples have been collected for each digit on 
average. Each sample is stored as a two-dimensional array in which the value of 
each cell is 0 or 1, where one and zero denote white and black pixels, respectively. 
First, each digit is thinned; then 32 equally spaced black pixels are extracted from each digit 
as its representative. Connecting these points to each other gives 31 line segments, 
and the angles between consecutive line segments give 30 features (changes in slope). 

On average, 30 samples per digit have been taken from the training set; the HMM models have 
then been trained and the resulting parameters saved. 100 samples per 
digit have been used in the testing phase. The efficiency matrix of this test is shown in Table 1. 
As can be observed, the average rate of correct recognition is 91.8%. For digits such as 0 and 5, 
which are very similar in structure, more confusion is seen. 
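
For readers who want to reproduce this kind of summary figure, the sketch below shows how per-digit and average recognition rates can be derived from an efficiency (confusion) matrix. It assumes that rows are the true digits and columns are the recognized digits; this layout and the function name recognition_rates are assumptions, and no values from Table 1 are used.

```python
def recognition_rates(confusion):
    """Return (per-class rates, mean rate) for a square confusion matrix.

    confusion[i][j] counts samples of true class i recognized as class j.
    """
    per_class = []
    for i, row in enumerate(confusion):
        total = sum(row)
        # The diagonal entry is the number of correctly recognized samples.
        per_class.append(row[i] / total if total else 0.0)
    return per_class, sum(per_class) / len(per_class)
```

When every digit has the same number of test samples, as in this experiment, the mean of the per-class rates equals the overall fraction of correctly recognized samples.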



Table 1: Final Test Efficiency Matrix 

6. Result & Conclusion 

The online recognition rate of handwritten digits is usually higher than the offline 
rate. In this article, a simple method has been presented for the recognition of Persian digits, 
based on extracting movement features from digit images. The goal is to 
recover online features of how the digits are drawn, while only offline images of the digits 
(the coordinates of black and white pixels) are available. The pre-processing steps play a crucial 
role in achieving a desirable result. 

The experimental results show that recognizing handwritten digits with 
the HMM method gives desirable results. The reason is that, in general, the HMM method performs 
well on time series. 